# Zero-shot image classification

The checkpoints below are listed with publisher, license, tags, download count, and like count.

| Model | Publisher | License | Tags | Downloads | Likes | Description |
| --- | --- | --- | --- | --- | --- | --- |
| OPENCLIP SigLIP Tiny 14 Distill SigLIP 400m Cc9m | PumeTu | MIT | Image Classification | 30 | 0 | Lightweight SigLIP-style vision-language model distilled from the larger SigLIP-400m model, suitable for zero-shot image classification. |
| Clip Backdoor Vit B16 Cc3m Blto Cifar | hanxunh | MIT | Text-to-Image, English | 9 | 0 | Pre-trained model for research on backdoor-sample detection in contrastive language-image pre-training; contains the BLTO backdoor trigger. |
| Vit Gopt 16 SigLIP2 384 | timm | Apache-2.0 | Text-to-Image | 1,953 | 1 | SigLIP 2 vision-language model trained on the WebLI dataset for zero-shot image classification. |
| Vit SO400M 16 SigLIP2 512 | timm | Apache-2.0 | Text-to-Image | 1,191 | 4 | SigLIP 2 vision-language model trained on the WebLI dataset for zero-shot image classification. |
| Vit SO400M 16 SigLIP2 384 | timm | Apache-2.0 | Text-to-Image | 106.30k | 2 | SigLIP 2 vision-language model trained on the WebLI dataset for zero-shot image classification. |
| Vit SO400M 16 SigLIP2 256 | timm | Apache-2.0 | Text-to-Image | 998 | 0 | SigLIP 2 vision-language model trained on the WebLI dataset for zero-shot image classification. |
| Vit SO400M 14 SigLIP2 378 | timm | Apache-2.0 | Text-to-Image | 1,596 | 1 | SigLIP 2 vision-language model trained on the WebLI dataset for zero-shot image classification. |
| Vit L 16 SigLIP2 512 | timm | Apache-2.0 | Text-to-Image | 147 | 2 | SigLIP 2 vision-language model trained on the WebLI dataset for zero-shot image classification. |
| Vit L 16 SigLIP2 256 | timm | Apache-2.0 | Text-to-Image | 888 | 0 | SigLIP 2 vision-language model trained on the WebLI dataset for zero-shot image classification. |
| Vit B 16 SigLIP2 512 | timm | Apache-2.0 | Text-to-Image | 1,442 | 1 | SigLIP 2 vision-language model trained on the WebLI dataset for zero-shot image classification. |
| Vit B 16 SigLIP2 384 | timm | Apache-2.0 | Text-to-Image | 1,497 | 0 | SigLIP 2 vision-language model trained on the WebLI dataset for zero-shot image classification. |
| Vit B 32 SigLIP2 256 | timm | Apache-2.0 | Text-to-Image | 691 | 0 | SigLIP 2 vision-language model trained on the WebLI dataset for zero-shot image classification. |
| Vit B 16 SigLIP2 256 | timm | Apache-2.0 | Text-to-Image | 10.32k | 4 | SigLIP 2 vision-language model trained on the WebLI dataset for zero-shot image classification. |
| Siglip2 So400m Patch14 384 | google | Apache-2.0 | Image-to-Text, Transformers | 622.54k | 20 | SigLIP 2 vision-language model built on the SigLIP pre-training objective, adding techniques that improve semantic understanding, localization, and dense feature extraction. |
| Siglip2 So400m Patch14 224 | google | Apache-2.0 | Image-to-Text, Transformers | 23.11k | 0 | Improved multilingual vision-language encoder based on SigLIP, with stronger semantic understanding, localization, and dense feature extraction. |
| Siglip2 Large Patch16 512 | google | Apache-2.0 | Text-to-Image, Transformers | 4,416 | 8 | Improved SigLIP-based model integrating multiple techniques that enhance semantic understanding, localization, and dense feature extraction. |
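
The SigLIP 2 checkpoints above can be used for zero-shot classification through the Hugging Face transformers pipeline. A minimal sketch, assuming the `google/siglip2-so400m-patch14-384` repo id (derived from the "Siglip2 So400m Patch14 384" entry) and a local test image:

```python
from transformers import pipeline

# Model id assumed from the "Siglip2 So400m Patch14 384" entry above;
# swap in any other SigLIP 2 checkpoint from the table.
classifier = pipeline(
    task="zero-shot-image-classification",
    model="google/siglip2-so400m-patch14-384",
)

# The pipeline embeds the image and each candidate label, then scores them.
results = classifier(
    "cat.jpg",  # path or URL to a test image (assumed)
    candidate_labels=["a photo of a cat", "a photo of a dog", "a diagram"],
)
print(results)  # list of {"label": ..., "score": ...}, highest score first
```

Because the class set is defined entirely by the candidate labels, no fine-tuning is needed to classify against a new label set.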
OpenCLIP and EVA-CLIP checkpoints trained on LAION and merged datasets:

| Model | Publisher | License | Tags | Downloads | Likes | Description |
| --- | --- | --- | --- | --- | --- | --- |
| CLIP ViT H 14 Laion2b S32b B79k | ModelsLab | MIT | Text-to-Image | 132 | 0 | OpenCLIP vision-language model trained on the English subset of LAION-2B; strong at zero-shot image classification and cross-modal retrieval. |
| CLIP ViT B 32 Laion2b S34b B79k | recallapp | MIT | Text-to-Image | 17 | 0 | OpenCLIP vision-language model trained on the English LAION-2B subset, supporting zero-shot image classification and cross-modal retrieval. |
| Eva Giant Patch14 Clip 224.laion400m | timm | MIT | Text-to-Image | 124 | 0 | EVA-CLIP vision-language model usable from both OpenCLIP and timm, supporting zero-shot image classification. |
| Eva02 Large Patch14 Clip 224.merged2b | timm | MIT | Image Classification | 165 | 0 | EVA-CLIP vision-language model with timm-format weights built on OpenCLIP, supporting tasks such as zero-shot image classification. |
| Eva02 Enormous Patch14 Clip 224.laion2b Plus | timm | MIT | Text-to-Image | 54 | 0 | Large-scale EVA-CLIP vision-language model based on the CLIP architecture, supporting tasks such as zero-shot image classification. |
| Eva02 Enormous Patch14 Clip 224.laion2b | timm | MIT | Text-to-Image | 38 | 0 | EVA-CLIP vision-language model based on the CLIP architecture, supporting zero-shot image classification. |
| Eva02 Base Patch16 Clip 224.merged2b | timm | MIT | Text-to-Image | 3,029 | 0 | EVA-CLIP vision-language model built on the OpenCLIP and timm frameworks, supporting tasks such as zero-shot image classification. |
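
The LAION-trained CLIP and EVA-CLIP entries follow the standard OpenCLIP interface. A minimal zero-shot classification sketch, assuming the `ViT-B-32` architecture with the `laion2b_s34b_b79k` pretrained tag (matching the "CLIP ViT B 32 Laion2b S34b B79k" entry) and a local `cat.jpg`:

```python
import torch
from PIL import Image
import open_clip

# Architecture and pretrained tag assumed from the entry name above.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

image = preprocess(Image.open("cat.jpg")).unsqueeze(0)  # test image (assumed)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize so the dot product is a cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # probability per text prompt for the image
```

The softmax over scaled cosine similarities is what makes this "zero-shot": the classifier is defined entirely by the text prompts.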
CLIP image encoders distributed through timm:

| Model | Publisher | License | Tags | Downloads | Likes | Description |
| --- | --- | --- | --- | --- | --- | --- |
| Vit Huge Patch14 Clip Quickgelu 378.dfn5b | timm | Other | Image Classification, Transformers | 27 | 0 | ViT-Huge CLIP image encoder trained on the DFN-5B dataset, using the QuickGELU activation. |
| Vit Huge Patch14 Clip 378.dfn5b | timm | Other | Image Classification, Transformers | 461 | 0 | The ViT-Huge visual encoder of DFN5B-CLIP, trained at 378x378 resolution. |
| Vit Base Patch16 Clip 224.dfn2b | timm | Other | Image Classification, Transformers | 444 | 0 | Vision Transformer CLIP image encoder carrying the DFN2B-CLIP weights released by Apple. |
| Vit Base Patch32 Clip 256.datacompxl | timm | Apache-2.0 | Image Classification, Transformers | 89 | 0 | CLIP Vision Transformer for image feature extraction, accepting 256x256 inputs. |
| Vit Base Patch32 Clip 224.datacompxl | timm | Apache-2.0 | Image Classification, Transformers | 13 | 0 | CLIP Vision Transformer for image feature extraction, trained on the DataComp XL dataset. |
| Vit Base Patch16 Clip 224.datacompxl | timm | Apache-2.0 | Image Classification, Transformers | 36 | 0 | ViT-B/16 CLIP Vision Transformer for image feature extraction, trained on the DataComp XL dataset. |
| Convnext Xxlarge.clip Laion2b Soup | timm | Apache-2.0 | Image Classification, Transformers | 220 | 0 | ConvNeXt-XXLarge CLIP image encoder trained by LAION, suitable for multimodal tasks. |
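
The DFN and DataComp entries above are image towers rather than complete image-text pairs, so through timm they are typically used for feature extraction. A sketch, assuming the `vit_base_patch16_clip_224.dfn2b` model name (lowercased from the entry above):

```python
import timm
import torch
from PIL import Image

# Model name assumed from the "Vit Base Patch16 Clip 224.dfn2b" entry.
# num_classes=0 drops the classification head, so the forward pass
# returns pooled image embeddings instead of logits.
model = timm.create_model(
    "vit_base_patch16_clip_224.dfn2b", pretrained=True, num_classes=0
)
model.eval()

# Build the preprocessing transform matching the model's training config.
config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**config, is_training=False)

image = Image.open("cat.jpg").convert("RGB")  # test image (assumed)
with torch.no_grad():
    features = model(transform(image).unsqueeze(0))  # shape (1, embed_dim)

print(features.shape)
```

The resulting embeddings can feed a downstream linear probe, a retrieval index, or any other multimodal pipeline.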
MetaCLIP, Long-CLIP, LLM2CLIP, and LAION-400M checkpoints:

| Model | Publisher | License | Tags | Downloads | Likes | Description |
| --- | --- | --- | --- | --- | --- | --- |
| Vit Huge Patch14 Clip 224.metaclip Altogether | timm | — | Image Classification | 171 | 1 | CLIP model based on the ViT-Huge architecture, supporting zero-shot image classification. |
| Longclip SAE ViT L 14 | zer0int | — | Text-to-Image, Safetensors | 290 | 18 | Long-CLIP model fine-tuned with a sparse autoencoder (SAE), supporting long text inputs and tuned for text-image alignment. |
| LLM2CLIP EVA02 L 14 336 | microsoft | Apache-2.0 | Text-to-Image, PyTorch | 75 | 60 | LLM2CLIP strengthens CLIP's visual representations with large language models, substantially improving cross-modal performance. |
| Vit Gigantic Patch14 Clip 224.metaclip 2pt5b | timm | — | Image Classification | 444 | 0 | Vision model trained on the MetaCLIP-2.5B dataset, usable from both the OpenCLIP and timm frameworks. |
| Vit Huge Patch14 Clip 224.metaclip 2pt5b | timm | — | Image Classification | 3,173 | 0 | Vision-language model trained on the MetaCLIP-2.5B dataset, supporting zero-shot image classification. |
| Vit Large Patch14 Clip 224.metaclip 2pt5b | timm | — | Image Classification | 2,648 | 0 | Vision model trained on the MetaCLIP-2.5B dataset, usable from both frameworks and supporting zero-shot image classification. |
| Vit Large Patch14 Clip 224.metaclip 400m | timm | — | Image Classification | 294 | 0 | Vision Transformer trained on the MetaCLIP-400M dataset, supporting zero-shot image classification. |
| Vit Large Patch14 Clip 224.laion400m E32 | timm | MIT | Image Classification | 1,208 | 0 | Large Vision Transformer trained on LAION-400M, supporting zero-shot image classification. |
| Vit Base Patch16 Clip 224.laion400m E31 | timm | MIT | Image Classification | 1,469 | 0 | Vision Transformer trained on LAION-400M, supporting zero-shot image classification. |
| Vit Base Patch32 Clip 224.metaclip 2pt5b | timm | — | Image Classification | 5,571 | 0 | Vision Transformer trained on the MetaCLIP-2.5B dataset, compatible with both open_clip and timm. |
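
The MetaCLIP entries are published as dual-framework weights, so the same repository should also load through OpenCLIP's `hf-hub:` scheme. A sketch, assuming the `timm/vit_base_patch32_clip_224.metaclip_2pt5b` repo id (lowercased from the entry above):

```python
import open_clip

# Repo id assumed from the "Vit Base Patch32 Clip 224.metaclip 2pt5b" entry.
# create_model_from_pretrained returns the model plus its eval preprocessing.
model, preprocess = open_clip.create_model_from_pretrained(
    "hf-hub:timm/vit_base_patch32_clip_224.metaclip_2pt5b"
)
tokenizer = open_clip.get_tokenizer(
    "hf-hub:timm/vit_base_patch32_clip_224.metaclip_2pt5b"
)
# encode_image / encode_text then behave exactly as in the
# OpenCLIP example earlier in this section.
```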